Simplifying Text Processing with Grammatically Aware Regular Expressions

Authors

  • Rafal Rzepka
  • Tyson Roberts
  • Kenji Araki
Abstract

In this paper we introduce Grammatically Aware Regular Expressions (GARE) and describe their use with examples from a moral consequences retrieval task. GARE extends the regular expression concept. It overcomes many of the difficulties of traditional regexps by adding normalization and POS awareness. With normalization, all grammatical forms of a verb or adjective can be found by searching for its basic form. With POS awareness, a search can be restricted, for example, to adjectives that follow the particle “wa”. We explain how GARE works, what makes it more expressive for natural language, and how it handles a number of matching cases that traditional regular expressions cannot solve on their own.
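The abstract does not give GARE's syntax. The following is therefore only a minimal Python sketch of the two ideas it names, built on our own assumptions. Each sentence is assumed to be already split into tokens carrying a surface form, a basic (dictionary) form and a POS tag, as a Japanese morphological analyzer would produce. The tokens and simplified tags below are hand-written. `Token`, `serialize`, `token_pattern` and `find_surfaces` are hypothetical helpers, not part of GARE.

Writing every token as `<surface|base|pos>` lets an ordinary regular expression constrain either the base form of a token (normalization) or its POS (POS awareness).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One morpheme (hypothetical format, assumed to come from a morphological analyzer)."""
    surface: str  # form as it appears in the text, e.g. 食べ
    base: str     # basic (dictionary) form, e.g. 食べる
    pos: str      # part-of-speech tag, e.g. 動詞 (verb)


def serialize(tokens):
    # Encode each token as <surface|base|pos> so one regex can inspect all fields.
    return "".join(f"<{t.surface}|{t.base}|{t.pos}>" for t in tokens)


def token_pattern(surface=None, base=None, pos=None):
    # Regex fragment matching exactly one serialized token.
    # A field left as None matches any value of that field.
    def field(value):
        return r"[^|<>]*" if value is None else re.escape(value)
    return rf"<{field(surface)}\|{field(base)}\|{field(pos)}>"


def find_surfaces(pattern, tokens):
    # Return the surface text of every non-overlapping match of `pattern`.
    results = []
    for match in re.finditer(pattern, serialize(tokens)):
        surfaces = re.findall(r"<([^|<>]*)\|", match.group(0))
        results.append("".join(surfaces))
    return results


if __name__ == "__main__":
    # Hand-written analyses; the POS tags are simplified IPADIC-style categories.
    ate = [Token("ケーキ", "ケーキ", "名詞"), Token("を", "を", "助詞"),
           Token("食べ", "食べる", "動詞"), Token("た", "た", "助動詞")]
    cheap = [Token("この", "この", "連体詞"), Token("店", "店", "名詞"),
             Token("は", "は", "助詞"), Token("安い", "安い", "形容詞")]
    rainy = [Token("今日", "今日", "名詞"), Token("は", "は", "助詞"),
             Token("雨", "雨", "名詞")]

    # Normalization: a plain regex for the basic form misses the inflected verb...
    print(re.search("食べる", "ケーキを食べた"))              # None
    # ...but constraining the base field finds it.
    print(find_surfaces(token_pattern(base="食べる"), ate))  # ['食べ']

    # POS awareness: the particle は (read "wa") followed by an adjective.
    wa_adj = token_pattern(surface="は", pos="助詞") + token_pattern(pos="形容詞")
    print(find_surfaces(wa_adj, cheap))  # ['は安い']
    print(find_surfaces(wa_adj, rainy))  # []
```

Serializing the tokens into a single string keeps the ordinary regex operators (alternation, repetition, grouping) available at no extra cost. The trade-off in this sketch is that no field may contain `|`, `<` or `>`.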

Similar resources

Sentiment Mining and Indexing in Opinmind

This paper presents a production system that efficiently mines social networking sites for sentiments and indexes the expressions for fast retrieval via a web search interface. Sentiment mining is a computational approach used to identify expressions made about topics within a span of text. Social networks represent a particularly rich corpus for mining sentiments because writers express sentim...

Optimally Streaming Greedy Regular Expression Parsing

We study the problem of streaming regular expression parsing: Given a regular expression and an input stream of symbols, how to output a serialized syntax tree representation as an output stream during input stream processing. We show that optimally streaming regular expression parsing, outputting bits of the output as early as is semantically possible for any regular expression of size m and a...

Simplifying Regular Expressions: A Quantitative Perspective

In this work, we consider the efficient simplification of regular expressions. We suggest a quantitative comparison of heuristics for simplifying regular expressions. We propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. We apply this normal form to determine an exact bound for the relation between the two most c...

Domain Specific Text Processing for Speech Synthesis

In Text-to-Speech (TTS) synthesis there are words and expressions that pose problems because some semantic knowledge is required to determine how they should be read out. This work implements a domain filter, a pre-processing module that supports the TTS system by analysing text belonging to a certain semantic domain and rewriting problematic expressions so that they are read out better. The fi...

Simplifying Regular Expressions

We consider the efficient simplification of regular expressions and suggest a quantitative comparison of heuristics for simplifying regular expressions. To this end, we propose a new normal form for regular expressions, which outperforms previous heuristics while still being computable in linear time. This allows us to determine an exact bound for the relation between the two prevalent measures...

Publication date: 2012